Mask estimation and sparse imputation for missing data speech recognition in multisource reverberant environments
نویسندگان
چکیده
This work presents an automatic speech recognition system which uses a missing data approach to compensate for environmental noise. The missing, noise-corrupted components are identified using binaural features or a support vector machine (SVM) classifier. To perform speech recognition using the partially observed data, the missing components are substituted with clean speech estimates calculated using sparse imputation. Evaluated on the CHiME reverberant multisource environment corpus, the missing data approach significantly improved the keyword recognition accuracy in moderate and poor SNR conditions. The best results were achieved when the missing components were identified using the binaural features and the clean speech estimates associated with observation uncertainty estimates.
منابع مشابه
Mask estimation and imputation methods for missing data speech recognition in a multisource reverberant environment
We present an automatic speech recognition system that uses a missing data approach to compensate for challenging environmental noise containing both additive and convolutive components. The unreliable and noisecorrupted (“missing”) components are identified using a Gaussian mixture model (GMM) classifier based on a diverse range of acoustic features. To perform speech recognition using the par...
متن کاملMask estimation in non-stationary noise environments for missing feature based robust speech recognition
In missing feature based automatic speech recognition (ASR), the role of the spectro-temporal mask in providing an accurate description of the relationship between target speech and environmental noise is critical for minimizing the degradation in ASR word accuracy (WAC) as the signal-to-noise ratio (SNR) decreases. This paper demonstrates the importance of accurate characterization of instanta...
متن کاملObservation uncertainty measures for sparse imputation
Missing data imputation estimates the clean speech features for automatic speech recognition in noisy environments. The estimates are usually considered equally reliable while in reality, the estimation accuracy varies from feature to feature. In this work, we propose uncertainty measures to characterise the expected accuracy of a sparse imputation (SI) based missing data method. In experiments...
متن کاملSeparating Speech From Noise Challenge
We have used the data from the PASCAL CHiME challenge with the goal of training a Support Vector Machine (SVM) to estimate a noise mask that labels time-frames/frequency-bins of the audio as 'reliable' or 'unreliable'. This noise mask could be used by another block in the signal processing pipeline to treat the unreliable data as missing and then replace the missing data with an estimate of the...
متن کاملNoise robust digit recognition using sparse representations
Despite the use o f noise robustness techniques, automatic speech recognition (ASR) systems make many more recognition errors than humans, especially in very noisy circumstances. We argue that this inferior recognition performance is largely due to the fact that in ASR speech is typically processed on a frameby-frame basis preventing the redundancy in the speech signal to be optimally exploited...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011